Efficient Name Variation Detection

Author

  • L. Karl Branting
Abstract

Semantic integration, link analysis, and other forms of evidence detection often require recognizing multiple occurrences of a single name. However, names frequently occur in orthographic variations resulting from phonetic variations and transcription errors. The computational expense of similarity assessment algorithms usually precludes applying them to all pairs of strings. Instead, it is typically necessary to use a high-recall, low-precision index to retrieve a smaller set of candidate matches, to which the similarity assessment algorithm is then applied. This paper describes five algorithms for efficient candidate retrieval: Burkhard-Keller trees (BKT), filtered Burkhard-Keller trees (FBKT), partition filtering, n-grams, and Soundex. An empirical evaluation showed that no single algorithm performed best under all circumstances. When the source of name variations was purely orthographic, partition filtering generally performed best. When similarity assessment was based on phonetic similarity and the phonetic model was available, BKT and FBKT performed best. When the pronunciation model was unavailable, Soundex was best for k=0 (homonyms), and partition filtering or BKT was best for k>0. Unfortunately, the high-recall retrieval algorithms were multiple orders of magnitude more costly than the low-recall algorithms.
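To make the candidate-retrieval idea concrete, the following is a minimal sketch of a BK-tree index queried with a distance threshold k. It assumes plain Levenshtein edit distance as the metric; the abstract does not specify the paper's distance function or implementation, and the names `BKTree` and `edit_distance` are hypothetical, chosen only for illustration.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


class BKTree:
    """Illustrative BK-tree; each node is a (name, {distance: child}) pair."""

    def __init__(self) -> None:
        self.root = None

    def add(self, name: str) -> None:
        if self.root is None:
            self.root = (name, {})
            return
        node = self.root
        while True:
            d = edit_distance(name, node[0])
            if d == 0:
                return  # name is already indexed
            child = node[1].get(d)
            if child is None:
                node[1][d] = (name, {})
                return
            node = child

    def search(self, query: str, k: int) -> List[str]:
        """Return all indexed names within edit distance k of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            name, children = stack.pop()
            d = edit_distance(query, name)
            if d <= k:
                results.append(name)
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - k, d + k] can contain names within distance k.
            for edge, child in children.items():
                if d - k <= edge <= d + k:
                    stack.append(child)
        return sorted(results)


if __name__ == "__main__":
    tree = BKTree()
    for n in ["Branting", "Brantin", "Brandting", "Keller", "Kellar"]:
        tree.add(n)
    print(tree.search("Branting", 1))
    print(tree.search("Keler", 1))
```

The pruning step in `search` is what lets the tree skip most distance computations: a subtree is visited only if the triangle inequality allows it to hold a name within k of the query. With k=0 the search reduces to exact lookup.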

Similar resources

A crack localization method for beams via an efficient static data based indicator

In this paper, a crack localization method for Euler-Bernoulli beams via an efficient static data based indicator is proposed. The crack in beams is simulated here using a triangular variation in the stiffness. Static responses of a beam are obtained by finite element modeling. In order to reduce the computational cost of the damage detection method, the beam deflection is fitted through a poly...

A New Dictionary Construction Method in Sparse Representation Techniques for Target Detection in Hyperspectral Imagery

Hyperspectral data in remote sensing, gathered at fine spectral resolution (about 10 nanometers), contain a plethora of spectral bands (roughly 200 bands). Since precious information about the spectral features of target materials can be extracted from these data, they have been used exclusively in hyperspectral target detection. One of the problems associated with the detect...

Compressed Domain Scene Change Detection Based on Transform Units Distribution in High Efficiency Video Coding Standard

Scene change detection plays an important role in a number of video applications, including video indexing, searching, browsing, semantic features extraction, and, in general, pre-processing and post-processing operations. Several scene change detection methods have been proposed in different coding standards. Most of them use fixed thresholds for the similarity metrics to determine if there wa...

Ranking efficient DMUs using the variation coefficient of weights in DEA

One of the difficulties of Data Envelopment Analysis (DEA) is the problem of deficiency discrimination among efficient Decision Making Units (DMUs) and hence, yielding large number of DMUs as efficient ones. The main purpose of this paper is to overcome this inability. One of the methods for ranking efficient DMUs is minimizing the Coefficient of Variation (CV) for inputs-outputs weights. In this pape...

Fault Detection Method on a Compressor Rotor Using the Phase Variation of the Vibration Signal

The aim of this work is the application of the phase variation in vibration signal for fault detection on rotating machines. The vibration signal from the machine is modulated in amplitude and phase around a carrier frequency. The modulating signal in phase is determined after the Hilbert transform and is used, with the Fast Fourier Transform, to extract the harmonics spectrum in phase. This me...

Publication date: 2006